DNA sequencing and parametric deconvolution
Abstract
Sanger DNA sequencing is one of the key practices of the Human Genome Project. Its data-analysis step, called base-calling, attempts to reconstruct target DNA sequences from the fluorescence intensities generated by sequencing machines. In this paper, we present our modeling framework for DNA sequencing, from which a base-calling scheme arises naturally. A large portion of DNA sequencing errors comes from the diffusion effect in electrophoresis, and deconvolution is the tool for addressing this problem. We present a new version of parametric deconvolution, motivated by the spike-convolution model, together with some recently obtained results on its asymptotics. One application of the asymptotics is to examine the resolution issue from the perspective of confidence intervals. We also report on an empirical study of the progressiveness of electrophoretic diffusion, carried out by estimating the slowly changing width parameter in the spike-convolution model. Furthermore, we include an example of complete preprocessing of DNA sequencing data.
Running title: DNA sequencing and parametric deconvolution
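As a rough illustration of the spike-convolution model mentioned in the abstract, the sketch below simulates a sparse train of positive spikes convolved with a point-spread function, plus additive noise. The Gaussian peak shape, the function names `gaussian_psf` and `spike_convolution`, and all numerical values are assumptions made for this example; they are not taken from the paper.

```python
import math
import random


def gaussian_psf(x, width):
    """Unit-area Gaussian peak (the Gaussian shape is an assumption of this sketch)."""
    return math.exp(-0.5 * (x / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def spike_convolution(times, locations, heights, width, noise_sd=0.0, seed=None):
    """Evaluate sum_j heights[j] * psf(t - locations[j]) at each t, plus optional noise."""
    rng = random.Random(seed)
    signal = []
    for t in times:
        # Each spike contributes one copy of the point-spread function,
        # shifted to the spike location and scaled by the spike height.
        value = sum(h * gaussian_psf(t - loc, width)
                    for loc, h in zip(locations, heights))
        if noise_sd > 0.0:
            value += rng.gauss(0.0, noise_sd)  # additive measurement error
        signal.append(value)
    return signal


# Hypothetical spike train: four positive spikes on a grid with step 0.05.
times = [0.05 * i for i in range(2001)]
locations = [20.0, 32.0, 45.0, 58.0]
heights = [1.0, 0.8, 1.2, 0.9]

clean = spike_convolution(times, locations, heights, width=2.0)
noisy = spike_convolution(times, locations, heights, width=2.0,
                          noise_sd=0.01, seed=0)
```

In this setting, parametric deconvolution amounts to estimating `locations`, `heights`, and `width` from a trace such as `noisy`. A larger `width` makes neighbouring peaks overlap more, which is the resolution issue the abstract refers to.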
Similar resources
Parametric deconvolution of positive spike trains
This paper describes a parametric deconvolution method (PDPS) appropriate for a particular class of signals which we call spike-convolution models. These models arise when a sparse spike train (Dirac deltas according to our mathematical treatment) is convolved with a fixed point-spread function, and additive noise or measurement error is superimposed. We view deconvolution as an estimation problem,...
Deconvolution of sparse positive spikes
Deconvolution is usually regarded as one of the ill-posed problems in applied mathematics if no constraints on the unknowns are assumed. In this paper, we discuss the idea of well-defined statistical models being a counterpart of the notion of well-posedness. We show that constraints on the unknowns such as positivity and sparsity can go a long way towards overcoming the ill-posedness in deconvo...
Iterative Deconvolution for Automatic Basecalling of the DNA Electrophoresis Time Series
In DNA (deoxyribonucleic acid) sequencing, there are four possible chemical base types: adenine (A), cytosine (C), guanine (G), thymine (T), which contain genetic information. The four base types are identified by examining four DNA electrophoresis time series. This procedure is called “basecalling”. However, in practice, there are many other undesired signal features that prevent the accurate ...
Parametric deconvolution of positive spike
This paper describes a parametric deconvolution method (PDPS) appropriate for a particular class of signals which we call spike-convolution models. These models arise when a sparse spike train (Dirac deltas according to our mathematical treatment) is convolved with a fixed point-spread function, and additive noise or measurement error is superimposed. We view deconvolution as an estimation problem,...
Deconvolution of Sparse Positive Spikes: Is It Ill-posed?
Deconvolution is usually regarded as one of the so-called ill-posed problems of applied mathematics if no constraints on the unknowns can be assumed. In this paper, we discuss the idea of well-defined statistical models being a counterpart of the notion of well-posedness. We show that constraints on the unknowns such as non-negativity and sparsity can help a great deal to get over the inherent i...